A Semantic Approach for Text Clustering Using WordNet Based on Multi-Objective Genetic Algorithms
Authors
Abstract
In this paper, we propose applying two Multi-Objective Genetic Algorithms (MOGAs), NSGA-II and SPEA2, to document clustering with semantic similarity measures based on WordNet. MOGAs have shown high performance compared with other clustering algorithms. The main problem with applying MOGAs to document clustering in the Vector Space Model (VSM) is that the VSM ignores relationships between important terms. The hierarchical structure of WordNet, a thesaurus-based ontology, provides an effective basis for semantic similarity measures. We tested these algorithms on data sets from the Reuters-21578 collection and compared them with a Genetic Algorithm (GA) that uses the same WordNet-based semantic similarity measures. We used the F-measure to evaluate the performance of these clustering algorithms. The experimental results show that the MOGAs based on WordNet outperform the other clustering algorithms when all of them use the same similarity measures.
Keywords: Document Clustering, Multi-Objective Genetic Algorithm, Semantic Similarity Measure, WordNet
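The abstract names the F-measure as the evaluation criterion but does not say which variant. A common definition for clustering matches each reference class to the cluster that best represents it: for class i and cluster j, recall is n_ij / n_i, precision is n_ij / n_j, F(i, j) = 2PR / (P + R), and the overall score is the class-size-weighted sum of each class's best F(i, j). The sketch below assumes that definition; the function name `clustering_f_measure` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Overall F-measure of a clustering against reference classes.

    Assumed definition (one common variant, not necessarily the paper's):
      recall(i, j)    = n_ij / n_i
      precision(i, j) = n_ij / n_j
      F(i, j)         = 2 * P * R / (P + R)
      F               = sum_i (n_i / n) * max_j F(i, j)
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    class_sizes = Counter(labels_true)              # n_i: documents per reference class
    cluster_sizes = Counter(labels_pred)            # n_j: documents per cluster
    joint = Counter(zip(labels_true, labels_pred))  # n_ij: overlap of class i and cluster j

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap, so F(i, j) = 0
            recall = n_ij / n_i
            precision = n_ij / n_j
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each class by its share of the documents
    return total


# Toy example: hypothetical topic labels and cluster ids for six documents.
true_topics = ["earn", "earn", "acq", "acq", "grain", "grain"]
clusters = [0, 0, 0, 1, 2, 2]
print(round(clustering_f_measure(true_topics, clusters), 4))  # → 0.8222
```

A perfect clustering scores 1.0, and splitting or merging classes lowers the score. Because every class takes its maximum over clusters, this variant does not by itself penalize producing many small clusters, so it is usually reported for a fixed number of clusters.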
Similar resources
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is an NP-hard problem that must be solved anew every semester. The main technique in the presented approach is analyzing data to resolve uncertainties in lecturers’ preferences and constraints within a department in order to obtain a ranking for each lecturer based on their requirements within a department where it is attempted to increase their satisfaction and develo...
Using Metaheuristic Algorithms Combined with Clustering Approach to Solve a Sustainable Waste Collection Problem
Sustainability is a monumental issue that should be considered in designing a logistics system. In order to incorporate sustainability concepts in our study, a waste collection problem with economic, environmental, and social objective functions was addressed. The first objective function minimized overall costs of the system, including establishment of depots and treatment facilities. Addressi...
AERO-THERMODYNAMIC OPTIMIZATION OF TURBOPROP ENGINES USING MULTI-OBJECTIVE GENETIC ALGORITHMS
In this paper, multi-objective genetic algorithms were employed for Pareto-based optimization of turboprop engines. The objective functions maximize the specific thrust, propulsive efficiency, thermal efficiency, and propeller efficiency, and minimize the thrust specific fuel consumption. These objectives usually conflict with each other. The design variables consist ...